Linear Discriminant Text Classification in High Dimension

Authors

  • András Kornai
  • J. Michael Richards
Abstract

Linear Discriminant (LD) techniques are typically used in pattern recognition tasks when there are many (n >> 10) datapoints in a low-dimensional (d < 10) space. In this paper we argue on theoretical grounds that LD is in fact more appropriate when training data is sparse and the dimension of the space is extremely high. To support this conclusion we present experimental results on a medical text classification problem of great practical importance: autocoding of adverse event reports. We trained and tested LD-based systems for a variety of classification schemes widely used in the clinical drug trial process (COSTART, WHOART, HARTS, and MedDRA) and obtained a significant reduction in the rate of misclassification compared both to generic Bayesian machine-learning techniques and to the current generation of domain-specific autocoders based on string matching.
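
To make the setting concrete, the following is a minimal sketch of a linear discriminant over sparse bag-of-words features. It is not the authors' system: it assumes a simple class-centroid discriminant (one weight vector and bias per class, equivalent to nearest-centroid under Euclidean distance), and the function names, toy reports, and labels (`HEADACHE`, `NAUSEA`) are hypothetical illustrations, not data or codes taken from the paper.

```python
from collections import Counter, defaultdict

def featurize(text):
    """Bag-of-words counts; every distinct token is its own dimension."""
    return Counter(text.lower().split())

def train(examples):
    """Build one (weights, bias) pair per class from (text, label) pairs.

    Assumption: weights are the class mean vector and the bias is
    -0.5 * ||mean||^2, so the highest score picks the nearest centroid.
    """
    sums = defaultdict(Counter)
    counts = Counter()
    for text, label in examples:
        sums[label].update(featurize(text))
        counts[label] += 1
    model = {}
    for label, total in sums.items():
        mean = {tok: c / counts[label] for tok, c in total.items()}
        bias = -0.5 * sum(v * v for v in mean.values())
        model[label] = (mean, bias)
    return model

def classify(model, text):
    """Return the class whose linear score w.x + b is largest."""
    x = featurize(text)

    def score(label):
        weights, bias = model[label]
        return sum(weights.get(tok, 0.0) * n for tok, n in x.items()) + bias

    return max(model, key=score)

# Hypothetical toy reports: few training points, one dimension per token.
TOY = [
    ("severe headache after dose", "HEADACHE"),
    ("patient reports throbbing headache", "HEADACHE"),
    ("nausea and vomiting", "NAUSEA"),
    ("felt nausea after dose", "NAUSEA"),
]
model = train(TOY)
print(classify(model, "mild headache"))
```

Even with only two reports per class, each class gets a single hyperplane in a space whose dimension grows with the vocabulary, which is the sparse, high-dimensional regime the abstract describes.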

Similar resources

A comparison of generalized linear discriminant analysis algorithms

Linear Discriminant Analysis (LDA) is a dimension reduction method which finds an optimal linear transformation that maximizes the class separability. However, in undersampled problems where the number of data samples is smaller than the dimension of data space, it is difficult to apply the LDA due to the singularity of scatter matrices caused by high dimensionality. In order to make the LDA ap...

Variable Selection as an Instance-Based Ontology Mapping Strategy

The paper presents a novel instance-based approach to aligning concepts taken from two heterogeneous ontologies populated with text documents. We introduce a concept similarity measure based on the size of the intersection of the sets of variables which are most important for the class separation of the instances in both input ontologies. We suggest a VC dimension variable selection criterion e...

A Novel Nonparametric Linear Discriminant Analysis for High-Dimensional Data Classification

Linear discriminant analysis (LDA) has played an important role in dimension reduction in the pattern recognition field. Basically, LDA has three deficiencies in dealing with classification problems. First, LDA is well-suited only for normally distributed data. Second, the number of features that can be extracted is limited by the rank of the between-class scatter matrix. Third, the singularity problem ari...

Improving Automatic Text Classification by Integrated Feature Analysis

Feature transformation in automatic text classification (ATC) can lead to better classification performance. Furthermore, dimensionality reduction is important in ATC. Hence, feature transformation and dimensionality reduction are performed to obtain lower computational costs with improved classification performance. However, feature transformation and dimension reduction techniques hav...

Hyperspectral Dimension Reduction Using Global and Local Information Based Linear Discriminant Analysis

Hyperspectral image classification has become an important research topic in remote sensing. Because the data are high-dimensional, special attention is needed when dealing with spectral data; thus, one of the research topics in hyperspectral image classification is dimension reduction. In this paper, a dimension reduction approach is presented for classification on hyperspectral images. Advantages...

Publication date: 2001